Using Document Structure on Retrieving Webpages at the Web-CLEF 2006

Authors

  • Syntia Wijaya
  • Bimo Widhi
  • Tommy Khoerniawan
  • Mirna Adriani
Abstract

We present a report on our participation in the mixed monolingual web task of the 2006 Cross-Language Evaluation Forum (CLEF). We compared the results of web page retrieval based on the page content, page title, and anchor text. The retrieval effectiveness for the combination of page content, page title, and anchor texts was better than that of the combination of page title and page title only. Applying pseudo-relevance feedback improved the retrieval performance of the queries.
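
The ideas named in the abstract (scoring a page over several fields, then expanding the query from the top-ranked pages) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field weights, the naive whitespace tokenizer, the raw term-frequency scoring, and the function names are hypothetical and are not taken from the paper, which does not describe its retrieval model in this abstract.

```python
from collections import Counter

# Hypothetical field weights; the abstract does not report any weighting scheme.
FIELD_WEIGHTS = {"content": 1.0, "title": 2.0, "anchor": 1.5}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def score(query_terms, page, fields):
    """Sum weighted term frequencies of the query terms over the chosen fields."""
    total = 0.0
    for field in fields:
        counts = Counter(tokenize(page.get(field, "")))
        total += FIELD_WEIGHTS[field] * sum(counts[t] for t in query_terms)
    return total


def rank(query_terms, pages, fields):
    """Return page ids ordered by descending score (ties broken by id)."""
    scored = [(score(query_terms, p, fields), pid) for pid, p in pages.items()]
    return [pid for _, pid in sorted(scored, key=lambda x: (-x[0], x[1]))]


def expand_query(query_terms, pages, fields, top_docs=1, new_terms=2):
    """Pseudo-relevance feedback: assume the top-ranked pages are relevant
    and add their most frequent unseen terms to the query."""
    counts = Counter()
    for pid in rank(query_terms, pages, fields)[:top_docs]:
        for field in fields:
            counts.update(tokenize(pages[pid].get(field, "")))
    for t in query_terms:
        counts.pop(t, None)
    # Sort by frequency, then alphabetically, so the result is deterministic.
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:new_terms]
    return list(query_terms) + [term for term, _ in best]
```

Calling `rank` with `("content", "title")` versus `("content", "title", "anchor")` shows how adding the anchor-text field can change the ordering, and passing the output of `expand_query` back into `rank` gives a second retrieval pass with the expanded query.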

Similar resources

University of Glasgow at CLEF 2013: Experiments in eHealth Task 3 with Terrier

In our participation in the CLEF 2013 eHealth task 3, we investigate (1) the effectiveness of our Divergence from Randomness (DFR) framework on retrieving medical webpages, (2) the adoption of classical pseudo-relevance feedback for improving the representation of the queries, and (3) the exploitation of a collection enrichment technique for alleviating the mismatches between the terms in docum...

Knowledge extraction from webpages

This article presents a system to extract knowledge from webpages by producing semantic annotations. Taking into account semantic information from the domain to annotate an element in a webpage implies solving two problems: (1) identifying the syntactic structure of this element in the webpage and (2) identifying the most specific concept (in terms of subsumption) of the ontology that will be ...

Query-Structure Based Web Page Indexing

Indexing is a crucial technique for dealing with the massive amount of data present on the web. In our third participation in the web track at TREC 2012, we explore the idea of building an efficient query-based indexing system over a Web page collection. Our prototype explores the trends in user queries and consequently indexes texts using particular attributes available in the documents. This pa...

LIMSI @ CLEF eHealth 2015 - task 2

This paper presents LIMSI’s participation in the User-Centered Health Information Retrieval task (task 2) at the CLEF eHealth 2015 workshop[5]. In our contribution we explored two different strategies to query expansion, i.e. one based on entity recognition using MetaMap[1] and the UMLS[3], and a second strategy based on disease hypothesis generation using self-constructed external resources su...

Collecting and Organizing Web Content

To collect and organize Web content today a user must make bookmarks, print whole webpages, or copy and paste pieces of webpages into a document. We present a framework for assisting the user in managing personal collections of Web content. The user interactively selects the webpage elements of interest, and the system builds an extraction pattern for those elements that is used to automaticall...

Publication date: 2006